Redact Sensitive Information within Scanned Documents using OCR and Pattern Recognition
A whitepaper for Software Engineers
Optical Character Recognition (OCR), combined with a regular expression engine that supports approximate matching, can capture and search text in images that would otherwise be lost. Even in today's digital age, many companies still rely on paper documents. OCR bridges that gap by capturing the data on those paper documents and bringing it into the digital workspace. OCR is useful on its own in many situations, and adding regular expression search with approximate matching makes the resulting solutions more powerful, because the search can tolerate the character errors that recognition sometimes introduces. Typical uses for OCR and regular expression search include creating searchable documents, capturing bank check amounts, extracting dollar amounts from invoices, redacting sensitive data, and indexing documents for subsequent search.

In this article, we review some existing problems that this technology can solve and give an overview of the technology used to build those solutions. Finally, we demonstrate the power of the combined technology by implementing one of the use cases.
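
To make the idea of approximate matching on OCR output concrete, here is a minimal Python sketch of the redaction use case. It is not the whitepaper's implementation and does not use OCR Xpress or its regular expression engine. It assumes the OCR text is already available as a string, assumes a US Social Security number layout (3-2-4 digits) as the sensitive pattern, and uses a hypothetical look-alike table (`LOOKALIKES`) to tolerate a bounded number of digit substitutions. The names `LOOKALIKES`, `SSN_CANDIDATE`, and `redact_ssns` are illustrative only.

```python
import re

# Assumed look-alike table: letters an OCR engine might emit in place of a digit.
# Illustrative only; a real engine's confusion set will differ.
LOOKALIKES = {"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8", "Z": "2"}

# One "digit slot" accepts a real digit or any look-alike letter.
_SLOT = "[0-9" + "".join(LOOKALIKES) + "]"

# Candidate SSN: 3-2-4 slots separated by a hyphen or space, not embedded
# in a longer alphanumeric run.
SSN_CANDIDATE = re.compile(
    rf"(?<![0-9A-Za-z]){_SLOT}{{3}}[- ]{_SLOT}{{2}}[- ]{_SLOT}{{4}}(?![0-9A-Za-z])"
)


def redact_ssns(text: str, max_errors: int = 2) -> str:
    """Mask SSN-like strings, allowing up to max_errors look-alike substitutions."""

    def _mask(match: re.Match) -> str:
        candidate = match.group(0)
        # Each look-alike letter counts as one substitution error.
        errors = sum(ch in LOOKALIKES for ch in candidate)
        if errors > max_errors:
            return candidate  # too far from a real number; leave untouched
        # Replace every slot character with X, keeping the separators.
        return "".join(ch if ch in "- " else "X" for ch in candidate)

    return SSN_CANDIDATE.sub(_mask, text)


if __name__ == "__main__":
    print(redact_ssns("Applicant SSN: l23-45-67B9"))  # prints "Applicant SSN: XXX-XX-XXXX"
```

This sketch handles substitution errors only. A general approximate matching engine would also account for inserted and deleted characters, and redacting a scanned document would additionally require mapping each match back to its pixel region on the page image so that the region can be blacked out.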



OCR Xpress™ v2 - The Full-Page OCR SDK



© 2008 Pegasus Imaging Corporation. All Rights Reserved.